AITopics | asymptotic convergence-rate

The Asymptotic Convergence-Rate of Q-learning

Neural Information Processing Systems, Apr-6-2023, 18:01:12 GMT

In this paper we show that for discounted MDPs with discount factor $\gamma > 1/2$, the asymptotic rate of convergence of Q-learning is $O(1/t^{R(1-\gamma)})$ if $R(1-\gamma) < 1/2$ and $O(\sqrt{\log\log t / t})$ otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here $R = p_{\min}/p_{\max}$ is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that $p_{\min} > 0$, where $p_{\min}$ and $p_{\max}$ now become the minimum and maximum state-action occupation frequencies corresponding to the stationary distribution.
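
To make the piecewise rate concrete, here is a small sketch that evaluates only the order of the bound, with all constants dropped, so it indicates how fast the error shrinks rather than giving a numeric error bound. The function names and the example frequencies are hypothetical; note that the abstract states the result for $\gamma > 1/2$.

```python
import math


def occupation_ratio(freqs):
    """R = p_min / p_max over the state-action occupation frequencies."""
    p_min, p_max = min(freqs), max(freqs)
    if p_min <= 0:
        raise ValueError("every state-action pair must have p > 0")
    return p_min / p_max


def rate_shape(t, freqs, gamma):
    """Order of the asymptotic Q-learning error at step t, constants dropped.

    Returns t^(-R(1-gamma)) if R(1-gamma) < 1/2, else sqrt(log log t / t).
    """
    if t <= math.e:
        raise ValueError("log log t is only positive for t > e")
    exponent = occupation_ratio(freqs) * (1 - gamma)  # R(1 - gamma)
    if exponent < 0.5:
        return t ** (-exponent)                      # O(1 / t^{R(1-gamma)})
    return math.sqrt(math.log(math.log(t)) / t)      # O(sqrt(log log t / t))


# Uniform sampling over 4 pairs: R = 1, exponent 0.1, value is about 0.1.
print(rate_shape(1e10, [0.25, 0.25, 0.25, 0.25], gamma=0.9))
# Skewed sampling: R = 0.25, exponent 0.025, value is about 0.56.
print(rate_shape(1e10, [0.1, 0.2, 0.3, 0.4], gamma=0.9))
```

The second call shows the practical reading of $R$: the less evenly the state-action pairs are visited, the smaller the exponent and the slower the guaranteed decay.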

asymptotic convergence-rate, maximum state-action occupation frequency, q-learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)

The Asymptotic Convergence-Rate of Q-learning

Szepesvári, Csaba

Neural Information Processing Systems, Dec-31-1998

Q-learning is a popular reinforcement learning (RL) algorithm whose convergence is well demonstrated in the literature (Jaakkola et al., 1994; Tsitsiklis, 1994; Littman and Szepesvári, 1996; Szepesvári and Littman, 1996). Our aim in this paper is to provide an upper bound for the convergence rate of (lookup-table based) Q-learning algorithms. Although this upper bound is not tight, computer experiments (to be presented elsewhere) and the form of the lemma underlying the proof indicate that the obtained upper bound can be made tight by a slightly more complicated definition of R. Our results extend to learning on aggregated states (see Singh et al., 1995) and to other related algorithms that admit a certain form of asynchronous stochastic approximation (see Szepesvári and Littman, 1996).

Present address: Associative Computing, Inc., Budapest, Konkoly Thege M. u. 29-33, HUNGARY-1121
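
The lookup-table Q-learning analysed here can be sketched as follows, in the sampling setting of the abstract: at every step one state-action pair is drawn from a fixed distribution and its table entry is updated. Everything concrete below is an assumption for illustration and is not taken from the paper: the two-state, two-action MDP (`P`, `REWARD`), the helper names, and the step size 1/(number of visits to the pair). Q* is obtained by value iteration only so that the error can be measured.

```python
import random

# Hypothetical 2-state, 2-action MDP (illustrative only).
# P[s][a] lists (probability, next_state); REWARD[s][a] is deterministic.
P = [
    [[(0.9, 0), (0.1, 1)], [(0.2, 0), (0.8, 1)]],
    [[(0.7, 0), (0.3, 1)], [(0.05, 0), (0.95, 1)]],
]
REWARD = [[0.0, 1.0], [0.5, 0.2]]
N_S, N_A = 2, 2


def optimal_q(gamma, iters=500):
    """Q* by repeatedly applying the Bellman optimality operator."""
    q = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(iters):
        q = [[REWARD[s][a] + gamma * sum(p * max(q[s2]) for p, s2 in P[s][a])
              for a in range(N_A)] for s in range(N_S)]
    return q


def sample_next(rng, s, a):
    """Draw s' ~ P(. | s, a) by inverting the cumulative distribution."""
    u, acc = rng.random(), 0.0
    for p, s2 in P[s][a]:
        acc += p
        if u < acc:
            return s2
    return P[s][a][-1][1]  # guard against floating-point round-off


def q_learning(gamma, freqs, steps, seed=0):
    """Lookup-table Q-learning; (s, a) is drawn i.i.d. with weights `freqs`
    and updated with the assumed step size 1 / (visits to (s, a))."""
    rng = random.Random(seed)
    pairs = [(s, a) for s in range(N_S) for a in range(N_A)]
    q = [[0.0] * N_A for _ in range(N_S)]
    visits = [[0] * N_A for _ in range(N_S)]
    for _ in range(steps):
        s, a = rng.choices(pairs, weights=freqs)[0]
        s2 = sample_next(rng, s, a)
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]
        target = REWARD[s][a] + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])
    return q


def max_error(q, q_star):
    """Sup-norm distance between two Q-tables."""
    return max(abs(q[s][a] - q_star[s][a]) for s in range(N_S) for a in range(N_A))


gamma = 0.6
q_star = optimal_q(gamma)
q_t = q_learning(gamma, freqs=[0.25] * 4, steps=200_000)
print(max_error(q_t, q_star))
```

Making `freqs` more uneven (a smaller R) or moving `gamma` toward 1 should, according to the bound stated in the abstract, make `max_error` shrink more slowly as `steps` grows.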

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.30)
Europe > Hungary > Budapest > Budapest (0.24)
Europe > Hungary > Csongrád-Csanád County > Szeged (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)